A General Method for Combining Predictors Tested on Protein Secondary Structure Prediction

Authors

  • Jakob V. Hansen
  • Anders Krogh
Abstract

Ensemble methods, which combine several classifiers, have been successfully applied to decrease the generalization error of machine learning methods. For most ensemble methods, the ensemble members are combined by weighted summation of their outputs, called the linear average predictor. The logarithmic opinion pool ensemble method instead uses a multiplicative combination of the ensemble members, which treats the outputs of the ensemble members as independent probabilities. The advantage of the logarithmic opinion pool is its connection to the Kullback-Leibler error function, which can be decomposed into two terms: an average of the errors of the ensemble members, and the ambiguity. The ambiguity is independent of the target function and can be estimated using unlabeled data. The advantage of the decomposition is that an unbiased estimate of the generalization error of the ensemble can be obtained while still training on the full training set. These properties can be used to improve classification. The logarithmic opinion pool ensemble method is tested on the prediction of protein secondary structure. The focus is on how much improvement the general ensemble method can give rather than on outperforming existing methods, because that typically involves several more steps of refinement.
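
The decomposition described in the abstract can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes the standard form of the logarithmic opinion pool, in which the pooled probability of class c is proportional to the product of the member probabilities p_j(c) raised to weights w_j that are non-negative and sum to one, and it takes the ambiguity to be the weighted Kullback-Leibler divergence from the pooled distribution to each member. The function names, the three-class example, and all numbers are hypothetical and chosen only for illustration.

```python
import math


def log_opinion_pool(member_probs, weights):
    """Pool class distributions multiplicatively: p(c) proportional to prod_j p_j(c)**w_j.

    Assumes every member probability is strictly positive.
    """
    n_classes = len(member_probs[0])
    # Weighted sum of log-probabilities per class (the unnormalised log pool).
    log_pool = [
        sum(w * math.log(p[c]) for p, w in zip(member_probs, weights))
        for c in range(n_classes)
    ]
    z = sum(math.exp(v) for v in log_pool)  # normalisation constant
    return [math.exp(v) / z for v in log_pool]


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q); classes with p(c) = 0 contribute 0."""
    return sum(pc * math.log(pc / qc) for pc, qc in zip(p, q) if pc > 0)


def decomposition(target, member_probs, weights):
    """Return (ensemble error, weighted mean member error, ambiguity)."""
    pool = log_opinion_pool(member_probs, weights)
    ensemble_error = kl(target, pool)
    mean_member_error = sum(
        w * kl(target, p) for p, w in zip(member_probs, weights)
    )
    # The ambiguity uses only the member outputs, never the target.
    ambiguity = sum(w * kl(pool, p) for p, w in zip(member_probs, weights))
    return ensemble_error, mean_member_error, ambiguity


# Hypothetical outputs of three ensemble members for one residue,
# over three illustrative classes (e.g. helix, strand, coil).
members = [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2], [0.6, 0.1, 0.3]]
weights = [1 / 3, 1 / 3, 1 / 3]
target = [1.0, 0.0, 0.0]  # one-hot label

err, mean_err, amb = decomposition(target, members, weights)
# Ensemble error equals mean member error minus ambiguity (up to rounding).
assert math.isclose(err, mean_err - amb, abs_tol=1e-12)
```

Because the ambiguity term never touches `target`, it can be computed on unlabeled inputs, which is the property the abstract relies on for estimating the ensemble's generalization error.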

Similar resources

Protein Secondary Structure Prediction: a Literature Review with Focus on Machine Learning Approaches

A DNA sequence, which contains all genetic traits, is not itself a functional entity. Instead, it is converted into protein sequences by the processes of transcription and translation. The protein sequence later takes on a 3D structure, which is a functional unit and can manage biological interactions using the information encoded in DNA. Every life process one can imagine is undertaken by proteins with specific functio...

Accuracy in Predicting Secondary Structure of Ionic Channels

Ionic channels are among the most difficult proteins for experimental structure determination; very few of them have been resolved. Bioinformatics tools have not been tested for this specific protein group. In the paper, the prediction quality of ionic channel secondary structure is evaluated. The tests were carried out with general protein predictors and predictors only for transmembrane segments. Th...

Hybrid fold recognition: combining sequence derived properties with evolutionary information.

Recent assessments of structure prediction have demonstrated that (i) although fold recognition methods can often identify remote similarities when standard sequence search methods fail, the score of the top-ranking fold is not always significant enough to allow a confident prediction; (ii) the use of structural information such as secondary structure increases recognition accuracy; (iii) moder...

Combining evolutionary and structural information for local protein structure prediction.

We study the effects of various factors in representing and combining evolutionary and structural information for local protein structural prediction based on fragment selection. We prepare databases of fragments from a set of non-redundant protein domains. For each fragment, evolutionary information is derived from homologous sequences and represented as estimated effective counts and frequenc...

Combining Artificial Neural Networks and GOR-V Information Theory to Predict Protein Secondary Structure from Amino Acid Sequences

Protein secondary structure prediction is a fundamental step in determining the 3D structure of a protein. In this paper, a new method for predicting protein secondary structure from amino acid sequences is proposed and implemented. The Cuff and Barton 513-protein data set is used to train and test the prediction methods under the same hardware, platforms, and environments. The newly de...

Publication date: 2000